Dependency Relations and Dependency Distance - a statistical view based on Treebank

نویسنده

  • Haitao Liu
چکیده

The dependency relation is the most essential ingredient in a dependency-based theory of syntax. This paper presents some statistical findings on the dependency relation extracted from a Chinese dependency treebank. A sentence in the proposed treebank can easily be converted into a SSyntS graph in Meaning-Text Theory. The statistics on the dependency relation show that modifiers make up 55% of all dependencies and actants have a lower proportion of 45%. The paper demonstrates it is possible to extract from the treebank active and passive valence information of a word (or word class). The paper gives a formula to calculate the mean dependency distance (MDD) for a specific type of dependency relation in a language and obtains MDD of all dependency types in Chinese. These figures show that some dependencies tend to be much farther apart than others, and demonstrate that dependency distance tends to minimization and different dependency types have varying preference on the direction of dependency.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

تبدیل خودکار درخت‌بانک وابستگی فارسی به درخت‌بانک سازه‌ای

There are two major types of treebanks: dependency-based and constituency-based. Both of them have applications in natural language processing and computational linguistics. Several dependency treebanks have been developed for Persian. However, there is no available big size constituency treebank for this language. In this paper, we aim to propose an algorithm for automatic conversion of a depe...

متن کامل

Feature Engineering in Persian Dependency Parser

Dependency parser is one of the most important fundamental tools in the natural language processing, which extracts structure of sentences and determines the relations between words based on the dependency grammar. The dependency parser is proper for free order languages, such as Persian. In this paper, data-driven dependency parser has been developed with the help of phrase-structure parser fo...

متن کامل

A Statistical Constraint Dependency Grammar (CDG) Parser

CDG represents a sentence’s grammatical structure as assignments of dependency relations to functional variables associated with each word in the sentence. In this paper, we describe a statistical CDG (SCDG) parser that performs parsing incrementally and evaluate it on the Wall Street Journal Penn Treebank. Using a tight integration of multiple knowledge sources, together with distance modeling...

متن کامل

Statistical Dependency Parsing for Turkish

This paper presents results from the first statistical dependency parser for Turkish. Turkish is a free-constituent order language with complex agglutinative inflectional and derivational morphology and presents interesting challenges for statistical parsing, as in general, dependency relations are between “portions” of words – called inflectional groups. We have explored statistical models tha...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007